Support Vector Machines with Disease-gene-centric Network Penalty for High Dimensional Microarray Data.

Authors

  • Yanni Zhu
  • Wei Pan
  • Xiaotong Shen
Abstract

With the availability of genetic pathways or networks and accumulating knowledge on genes with variants predisposing to diseases (disease genes), we propose a disease-gene-centric support vector machine (DGC-SVM) that directly incorporates these two sources of prior information into building microarray-based classifiers for binary classification problems. DGC-SVM aims to detect genes that cluster together and around some key disease genes in a gene network. To achieve this goal, we propose a penalty over suitably defined groups of genes. A hierarchy is imposed on an undirected gene network to facilitate the definition of such gene groups. Our proposed DGC-SVM uses the hinge loss penalized by a sum of L∞-norms, one applied to each group. Simulation studies show that DGC-SVM not only detects more disease genes along pathways than the existing standard SVM and the SVM with an L1-penalty (L1-SVM), but also captures disease genes that potentially affect the outcome only weakly. Two real data applications demonstrate that DGC-SVM improves gene selection, with predictive performance comparable to the standard SVM and L1-SVM. The proposed method has the potential to be an effective classification tool that encourages gene selection along paths to, or clustering around, known disease genes for microarray data.
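As a rough illustration of the objective described in the abstract (hinge loss plus a sum of group-wise L∞-norms), the following Python sketch evaluates such a penalized objective on toy data. The function names, the tuning parameter `lam`, the toy data, and especially the gene groups are assumptions made for illustration; the abstract does not specify how groups are built from the network hierarchy or whether they are weighted.

```python
import numpy as np

def hinge_loss(w, b, X, y):
    """Sum over samples of max(0, 1 - y_i * (x_i . w + b))."""
    margins = y * (X @ w + b)
    return float(np.sum(np.maximum(0.0, 1.0 - margins)))

def group_linf_penalty(w, groups):
    """Sum over groups of the L-infinity norm (largest |w_j| in the group)."""
    return float(sum(np.max(np.abs(w[list(g)])) for g in groups))

def penalized_objective(w, b, X, y, groups, lam):
    """Hinge loss plus lam times the grouped L-infinity penalty."""
    return hinge_loss(w, b, X, y) + lam * group_linf_penalty(w, groups)

# Toy data: 4 samples, 3 genes, labels in {-1, +1}.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0],
              [-1.0, 0.0, -2.0],
              [0.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.0, -0.25])
b = 0.0

# Hypothetical groups: gene 0 plays the role of a known disease gene,
# and each group pairs it with one neighbor in the network.
groups = [(0, 1), (0, 2)]

print(penalized_objective(w, b, X, y, groups, lam=2.0))
```

Because the L∞-norm of a group depends only on its largest absolute coefficient, a penalty of this form tends to keep or drop the genes in a group together, which is consistent with the stated goal of selecting genes that cluster around disease genes. Fitting the classifier would require minimizing this objective, which the sketch does not attempt.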


Similar resources

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression data provide a wealth of information on thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce high-dimensional data using statistical methods to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...


Identification of Alzheimer disease-relevant genes using a novel hybrid method

Identifying genes underlying complex diseases/traits that generally involve multiple etiological mechanisms and contributing genes is difficult. Although microarray technology has enabled researchers to investigate gene expression changes, identifying pathobiologically relevant genes remains a challenge. To address this challenge, we apply a new method for selecting the disease-relevant gen...


A Probabilistic Neural Network for Gene Selection and Classification of Microarray Data

In this paper, we present the mathematical foundations of a probabilistic neural network for gene selection and classification of high-dimensional microarray data. We present a catalogue of features that a classification system for microarray data should incorporate. We then use this catalogue and compare the theoretical properties of probabilistic neural networks with support vector machines w...


Gene selection using support vector machines with non-convex penalty

MOTIVATION: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data comes from their innate nature of 'high-dimensional low sample size'. Therefore, robust and accurate gene selection methods are required to identify differentially expr...


Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease

Background and purpose: Machine learning is a class of modern, powerful tools that can solve many important problems people face today. Support vector regression (SVR), a member of the machine learning family, is a way to build a regression model. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...



Journal title:
  • Statistics and its interface

Volume 2, Issue 3


Publication date: 2009